Add function to compute statistics about a scheduled product #6589

Draft
wants to merge 1 commit into
base: master
from

Conversation

Martchus
Contributor

This is a prerequisite for adding hook script support for scheduled products: to know when to execute the hook script, we need to be able to determine whether all jobs of a scheduled product are done.

The hook script is supposed to do automatic approval/disapproval of changes. Hence it needs to know not only whether all jobs are done but also the state/result they ended up with. So this function returns these kinds of statistics instead of a binary "all done". It may need to be changed to return higher-level statistics (like "all passed"), though.
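
To illustrate the kind of statistics meant here, a minimal Python sketch follows. It is not the code in this PR, which does the work in a database query. The in-memory `jobs` mapping, the field names `state`, `result` and `clone_id`, and the nested `{state: {result: count}}` output shape are all assumptions made for illustration.

```python
def latest_job(job_id, jobs):
    """Follow the restart chain (assumed `clone_id` links) to the newest job."""
    seen = set()
    while jobs[job_id].get("clone_id") is not None and job_id not in seen:
        seen.add(job_id)  # guard against accidental cycles in the sample data
        job_id = jobs[job_id]["clone_id"]
    return job_id


def job_statistics(initial_job_ids, jobs):
    """Count the latest job of each restart chain by state and result."""
    stats = {}
    for job_id in initial_job_ids:
        job = jobs[latest_job(job_id, jobs)]
        by_result = stats.setdefault(job["state"], {})
        by_result[job["result"]] = by_result.get(job["result"], 0) + 1
    return stats


# Hypothetical sample: job 1 failed and was restarted as job 2, job 3 still runs.
jobs = {
    1: {"state": "done", "result": "failed", "clone_id": 2},
    2: {"state": "done", "result": "passed", "clone_id": None},
    3: {"state": "running", "result": "none", "clone_id": None},
}
stats = job_statistics([1, 3], jobs)
```

Only the last job of each restart chain is counted, so a failed job that was later restarted successfully does not show up as a failure.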

Related ticket: https://progress.opensuse.org/issues/184690


This is just a draft because it still needs tests, and it makes no sense to merge it before it is actually used. However, I tested it on OSD with production data, on scheduled products that have many jobs (up to 260) and a few restart chains (of depth up to 6), and it was very fast.

That's good because it means the hardest part of this feature is solved. Now I just need to use this in the Minion job where we already execute job done hooks: there we can check whether a scheduled product hook can be executed and, if so, execute it the same way we execute job done hooks. I would read the script name/path from the scheduled product settings and pass the scheduled product ID as the first argument and the statistics as JSON as the second argument. (As mentioned in the commit message, the statistics could be a bit more high-level than what I currently have.)

Then I can write the hook script. I think this could be done in Python to be able to use the osc Python libraries directly, just like qem-bot does. Alternatively, it could be a simple shell script that invokes the osc CLI tool.
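
A minimal sketch of what the Python side of such a hook script might look like, assuming the calling convention described above (scheduled product ID as first argument, statistics as JSON as second argument). The `{state: {result: count}}` layout of the statistics, the set of acceptable results and the approve/disapprove/wait rule are assumptions for illustration. The actual approval via osc is deliberately left out.

```python
import json

# Assumption: these results count as "good enough" for approval.
OK_RESULTS = {"passed", "softfailed"}


def decide(stats):
    """Map assumed {state: {result: count}} statistics to a decision."""
    if any(state != "done" for state in stats):
        return "wait"  # at least one job has not finished yet
    results = stats.get("done", {})
    if results and all(result in OK_RESULTS for result in results):
        return "approve"
    return "disapprove"


def main(argv):
    """argv[1] is the scheduled product ID, argv[2] the statistics as JSON."""
    product_id = int(argv[1])
    stats = json.loads(argv[2])
    print(f"scheduled product {product_id}: {decide(stats)}")
    return 0
```

A real script would call `main(sys.argv)` and replace the `print` with whatever performs the approval.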

So I guess that would be the plan for this feature and it is maybe useful in the future for more than just the increment approval.

Martchus force-pushed the scheduled-product-hook branch from 7f78922 to d63ea02 on July 15, 2025 16:04

codecov bot commented Jul 15, 2025

Codecov Report

Attention: Patch coverage is 16.66667% with 5 lines in your changes missing coverage. Please review.

Project coverage is 99.10%. Comparing base (fe96ee5) to head (76b7951).
Report is 6 commits behind head on master.

Files with missing lines Patch % Lines
lib/OpenQA/Schema/Result/ScheduledProducts.pm 16.66% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6589      +/-   ##
==========================================
- Coverage   99.11%   99.10%   -0.02%     
==========================================
  Files         399      399              
  Lines       40717    40723       +6     
==========================================
+ Hits        40358    40359       +1     
- Misses        359      364       +5     

@okurz
Member

okurz commented Jul 16, 2025

> So I guess that would be the plan for this feature and it is maybe useful in the future for more than just the increment approval.

Have you considered providing the information from this function in an amqp message so that external tooling can react to such a "build is done" event?

@Martchus
Contributor Author

I think emitting an amqp message would make this more challenging in three ways:

  1. We still need an alternative approach to cover the case of a missed amqp message. So it would be work in addition to some other approach like an API route someone can call periodically.
  2. I assume you are talking about a generic event, so openQA would emit such an amqp message for every scheduled product. I'm not sure whether this scales well enough. The query is fast, but to determine whether the whole scheduled product is finished we would need to run it for each finished job in every scheduled product. It is one thing to do this for selected scheduled products but another to do it for all. If a scheduled product has e.g. 200 jobs, we would need to check 200 times whether the scheduled product is done, each time with a non-trivial recursive query that checks all 200 jobs. That makes the database go through 40000 jobs for this single scheduled product. On OSD we have lots of scheduled products and some are even bigger than 200 jobs. I also haven't taken other overhead into account. So I am not confident that this quadratic-scaling algorithm will be good enough for the generic case. (We could maybe optimize this with an early return so the number of checks would be halved on average, but this would further complicate things and not change the quadratic nature of this.)
  3. I'm not sure whether we have resolved the issue of allowing amqp traffic on GitLab or Open Platform.
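
The arithmetic behind point 2 can be spelled out in a few lines of Python. This is only a back-of-the-envelope model: it assumes one check per finished job and that each check visits every job of the scheduled product. The early-return variant uses the crude assumption that the check after the k-th finished job visits about k jobs.

```python
def jobs_visited(n_jobs, early_return=False):
    """Rough count of job rows visited when checking after every finished job."""
    if early_return:
        # Crude model: the k-th check visits about k jobs, so 1 + 2 + ... + n.
        return n_jobs * (n_jobs + 1) // 2
    # Each of the n checks visits all n jobs.
    return n_jobs * n_jobs


for n in (200, 260):
    print(n, jobs_visited(n), jobs_visited(n, early_return=True))
```

For 200 jobs this gives 40000 visits without and 20100 with the early return, so the count is roughly halved but still grows quadratically.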

We should probably only solve the problem at hand for now and in the simplest way. Point 1 made me think of an even simpler approach than a hook script: instead of running a hook script on the OSD host, we could just provide an API endpoint that returns these statistics. Then some bot can query this route once per hour (probably once every 6 hours would be enough) for relevant scheduled products if there is a pending "increment" to be approved.

This would be efficient and easy to implement because:

  • If there is no increment to be approved there is no additional overhead at all within openQA. Only the bot has to check whether there's an increment and return early if not.
  • The query would only be done for scheduled products we are interested in. The same goes for my hook script approach of course (but not the next point).
  • The bot could directly focus on the most recent relevant scheduled products. So in case a product is scheduled again (e.g. after amending settings) we would not deal with the old scheduled product anymore at all.
  • The query wouldn't need to be done after each individual job is done but only once in a certain time interval. Hence no bad quadratic scaling.
  • We don't need to check after each and every job is done whether that job (or the original job) was scheduled via a certain scheduled product. Although to be fair, we already do this anyway for the webhook-based CI integration and it is just one additional query per job.
  • The bot could just be another sub-command of qem-bot. This way we could use the existing repo/helpers from qem-bot (but still wouldn't need to care about its existing logic as we would just add a new sub-command). We could also re-use its execution environment where everything is already in place to access openQA and IBS. I think especially the last point would reduce the effort quite a lot.
  • We wouldn't need to introduce an additional variable to specify a hook script in scheduled product settings. (Although this wouldn't be a big deal.)
  • The script wouldn't need to run on the OSD host but on GitLab so we avoid overloading OSD with more and more things.
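
The control flow of such a bot run could be sketched as follows. All three callables are hypothetical placeholders passed in by the caller: how a pending increment is detected, how the most recent relevant scheduled products are found, and what the statistics API route looks like are not defined anywhere yet.

```python
def poll_once(has_pending_increment, latest_product_ids, fetch_statistics):
    """One periodic bot run; returns {scheduled_product_id: statistics}.

    All three arguments are hypothetical callables supplied by the caller.
    """
    if not has_pending_increment():
        return {}  # nothing to approve, so openQA is not queried at all
    # Only the most recent relevant scheduled products are looked at.
    return {pid: fetch_statistics(pid) for pid in latest_product_ids()}


# Usage with stand-in callables instead of real openQA/IBS access.
queried = []


def fake_fetch(product_id):
    queried.append(product_id)
    return {"done": {"passed": 1}}


idle = poll_once(lambda: False, lambda: [7, 9], fake_fetch)
busy = poll_once(lambda: True, lambda: [7, 9], fake_fetch)
```

In the idle case no statistics are fetched, which corresponds to the first bullet above. The cost of a run depends on the polling interval rather than on how many jobs finish.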

This approach still has one challenge, though. To find the most recent relevant scheduled product for each arch we need a query like this:

select max(id) as id, arch from scheduled_products where distri = 'sle' and version = '15.99' and flavor = 'Online-Increments' group by arch;

This is currently not very efficient because we don't have an index on the scheduled products table, which has almost 3 million rows. (The query still seems to be fast enough for now and we only have to run it e.g. once per hour.)
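
A small, self-contained way to try out the query is shown below, using Python's `sqlite3` module with an in-memory table as a stand-in for the real database. The reduced table layout, the sample rows and the index name `idx_sp_medium` are assumptions. Whether a comparable index would make the query efficient on the production table has not been tested here.

```python
import sqlite3

QUERY = """
select max(id) as id, arch from scheduled_products
where distri = ? and version = ? and flavor = ?
group by arch
"""


def latest_products_per_arch(conn, distri, version, flavor):
    """Return {arch: id of the most recent matching scheduled product}."""
    rows = conn.execute(QUERY, (distri, version, flavor)).fetchall()
    return {arch: product_id for product_id, arch in rows}


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table scheduled_products "
    "(id integer primary key, distri text, version text, flavor text, arch text)"
)
# Hypothetical index covering the columns the query filters and groups on.
conn.execute(
    "create index idx_sp_medium on scheduled_products (distri, version, flavor, arch)"
)
conn.executemany(
    "insert into scheduled_products values (?, ?, ?, ?, ?)",
    [
        (1, "sle", "15.99", "Online-Increments", "x86_64"),
        (2, "sle", "15.99", "Online-Increments", "aarch64"),
        (3, "sle", "15.99", "Online-Increments", "x86_64"),  # re-triggered
        (4, "sle", "15.6", "Online", "x86_64"),  # unrelated product
    ],
)
latest = latest_products_per_arch(conn, "sle", "15.99", "Online-Increments")
```

With the sample rows, the re-triggered product 3 replaces product 1 for x86_64, while product 2 remains the latest for aarch64.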

I think all approaches actually have this challenge because we always have to determine whether the scheduled product we are currently dealing with (e.g. in its hook script or when receiving an amqp event about it) is still the most recent. With the API approach we would at least never have to invoke this query just to find out that the scheduled product we are currently dealing with is not relevant anymore. (And by the way, just as I am writing this I see that this is not a hypothetical concern, as Richard has just re-triggered the scheduled product, see https://suse.slack.com/archives/C08DC2SHABV/p1752637433485059?thread_ts=1750847709.352269&cid=C08DC2SHABV.)

Note that if we have an API as I mentioned we can still think of emitting an amqp message in addition and for specific products. (In addition as per point 1 and only for specific scheduled products to avoid point 2.)
